1 Introduction

In this tutorial, we’ll explore how to use R Studio to make basic maps that allow us to visualize cross-national variation in transgender rights. We will be using data from the Trans Rights Indicator Project (TRIP), a new cross-national time-series dataset of transgender rights across the world that has been developed and published by the political scientist Myles Williamson. The TRIP dataset and codebook can be downloaded from the project’s website. The corresponding paper, which discusses the data at greater length, was published in the political science journal Perspectives on Politics, and is entitled “A Global Analysis of Transgender Rights: Introducing the Trans Rights Indicator Project (TRIP)”. Here is the citation information for that paper:

Williamson, Myles. 2024. “A Global Analysis of Transgender Rights: Introducing the Trans Rights Indicator Project (TRIP)”. Perspectives on Politics 22(3): 799-818. https://doi.org/10.1017/S1537592723002827

In the tutorial, we will recreate the categorical maps in Figure 3 and Figure 4 of that paper, which display information about whether countries across the world allow for legally recognized gender transitions in 2000 and 2021, respectively. We’ll then create a map that shows whether countries have passed broad-based anti-discrimination laws that protect the transgender community and their rights. After that, we’ll make another map of an overall index of transgender rights that aggregates information in the dataset into a comprehensive index of how “trans friendly” a country’s laws and policies are in the year 2021. Finally, we’ll conclude with an exercise in which you will be invited to select a transgender right of interest from the dataset, and create your own map of cross-national variation in protections for that right.

2 Preliminaries

2.1 Download data and set working directory

Please download the transgender rights datasets and associated codebook from the TRIP website to your computer, saving the materials to a directory that you created specifically for this workshop. Note that, when downloaded, the file names contain spaces. File names with spaces can cause problems when reading files into R, so please rename the specific dataset we’ll work with, “Trip Scores.xlsx”, so that its name doesn’t contain any spaces; the most straightforward option is to change the file name to “TripScores.xlsx”.

Once you’ve downloaded the materials from TRIP into a directory dedicated to this workshop and changed the file name of the dataset we’ll work with to remove spaces (“Trip Scores.xlsx” to “TripScores.xlsx”), please set this directory as your working directory in R. Essentially, a working directory is the location on your computer where R will look for files to read in, as well as the location where it will export files from R. To check your current working directory, you can use the getwd() function, which will print the file path to your console. If you are familiar with the concept of a file path, you can pass the path to the directory which contains the workshop data as an argument to the setwd() function in order to set it as your working directory, i.e. setwd("filepath"). However, if you are unfamiliar with the idea of the working directory, it would be easier to set your working directory using the R Studio menu. To do so, click the Session menu, scroll down to Set Working Directory, and then click Choose Directory. You will then be taken to a menu where you can select the directory which you would like to designate as your working directory.
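As a quick check that the working directory is set correctly, the following sketch prints the current working directory and tests whether the renamed data file is visible from it. The object names current_dir and data_found are illustrative, and the path shown in the commented-out setwd() call is a placeholder rather than a real location.

```r
# Store and print the current working directory
current_dir <- getwd()
current_dir

# To change the working directory in code, pass your own path to setwd();
# the path below is a placeholder and must be replaced with a real one:
# setwd("path/to/your/workshop/directory")

# Check whether the renamed dataset is visible from the working directory;
# this is TRUE once the file is in place, and FALSE otherwise
data_found <- file.exists("TripScores.xlsx")
data_found
```

If data_found is FALSE, the working directory is probably not the directory that holds the downloaded data, or the file has not yet been renamed.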

2.2 Install packages

R is an open-source programming language for statistical computing that allows users to carry out a wide range of data analysis and visualization tasks (among other things). One of the big advantages of using R is that it has a very large user community among social scientists and statisticians, who frequently publish R packages. One might think of packages as workbooks of sorts, which contain a well-integrated set of R functions, scripts, data, and documentation; these “workbooks” are designed to facilitate certain tasks or implement given procedures. These packages are then shared with the broader community, so that anyone who needs to accomplish the tasks a package addresses can use it in their own projects. The ability to use published packages considerably simplifies the work of applied social scientists using R; it means that they rarely have to write code entirely from scratch, and can build on the code that others have published in the form of packages. This allows applied researchers to focus on substantive problems, without having to get too bogged down in complicated programming tasks.

In the context of this tutorial, generating maps of transgender rights based on a published tabular dataset would be quite complex if we had to write all our code from scratch. However, because we are able to make use of mapping and visualization packages written by other researchers, the task is considerably simpler, and will not require any complicated programming.

In order to process our data and make our maps, we will use a variety of packages. They are:

  • sf: The sf package allows us to work with spatially explicit data in R. In particular, it allows us to work with “sf” objects. “sf” stands for “simple features”, and is a data structure that is able to store and display geospatial vector data. Vector data is one of the two main types of GIS data (along with raster data), and we’ll discuss it in more detail next class.
  • tmap: The tmap package will allow us to create and customize our map. It has become one of the major R mapping packages in use today. There are other packages that facilitate mapping in R. One prominent alternative to tmap, which many of you may be familiar with, is ggplot2. However, while tmap uses very similar syntax to ggplot2, the former is a dedicated mapping package, while the latter is a more general-purpose visualization package. In my experience, tmap is slightly more intuitive and user-friendly, so we will use it instead of ggplot2 in this tutorial.
  • rnaturalearth and rnaturalearthdata: In order to visualize our data on a world map, we need a spatial dataset of the world’s country boundaries. One way to get such a dataset is to download it from a public repository, and then load it into R Studio. However, these packages allow us to load a spatial dataset of world boundaries into R Studio as sf objects without having to actually download anything, which effectively saves us a few steps in the workflow.
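  • tidyverse: The tidyverse is a suite of data-science packages (ggplot2, mentioned above, is actually a part of the tidyverse) that provide useful functions to implement common data science/data analysis tasks. We’ll use some tidyverse packages to clean up our data, subset data, implement table joins etc.
  • readxl: The readxl package provides the read_excel() function, which we’ll use to read the TRIP dataset (an Excel file) into R. It is installed along with the tidyverse, but it has to be loaded separately with library().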

To install a package in R, pass the name of the package (within quotation marks) to the install.packages() function. For example, let’s say you don’t have tmap installed. You can install it with the following:

# Installs the tmap package
install.packages("tmap")

A function is essentially a programming construct that takes a specified input, runs this input (called an “argument”) through a set of procedures, and returns an output. In the code block above, the name of the package we wanted to install (here, “tmap”) was enclosed within quotation marks and passed as an argument to install.packages; this effectively downloaded the tmap package to your computer.

Repeat that process for any packages you don’t have installed.
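If you would rather not check each package by hand, one possible shortcut is sketched below: it works out which of the packages loaded in this tutorial are missing from your computer and installs only those. The object names (required_packages, is_installed, missing_packages) are illustrative, and the repos URL is an assumption (the general CRAN mirror); you can drop that argument if R Studio already has a mirror configured.

```r
# Packages used in this tutorial
required_packages <- c("sf", "tmap", "rnaturalearth", "rnaturalearthdata",
                       "tidyverse", "readxl")

# Check, package by package, whether it is already installed
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)

# Keep only the names of the packages that are not installed
missing_packages <- required_packages[!is_installed]
missing_packages

# Install only the missing packages (nothing happens if none are missing);
# the repos URL is an assumed CRAN mirror
if (length(missing_packages) > 0) {
  install.packages(missing_packages, repos = "https://cloud.r-project.org")
}
```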

2.3 Load libraries

After all the packages are downloaded, we must load them into memory. We can think of the process of loading installed packages into a current R environment as analogous to opening up an application on your phone or computer after it has been installed (even after an application has been installed, you can’t use it until you open it!). To load (i.e. “open”) an R package, we pass the name of the package we want to load as an argument to the library() function. Below, we load all of the required packages into memory:

# Loads required libraries
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(tidyverse)
library(readxl)

At this point, the packages are loaded and ready to go! One important thing to note regarding the installation and loading of packages is that we only have to install packages once; after a package is installed, there is no need to subsequently reinstall it, except in particular circumstances (for instance, if you update or reinstall R on your computer). However, we must load the packages we need (using the library() function) every time we open a new R session. In other words, if we were to close R Studio at this point and open it up later, we would not need to install these packages again, but would need to load the packages again.

Note that the code blocks in this tutorial usually have comments, prefaced by a hash (“#”). When writing code in R (or any other programming language) it is good practice to preface one’s code with brief comments that describe what a block of code is doing. Writing these comments can allow someone else (or your future self) to read and understand the code more quickly than might otherwise be the case. The hash before the comment effectively tells R that the subsequent text is a comment, and should be ignored when running a script. If one does not preface the comment with a hash, R won’t know to ignore the comment, and will typically throw an error message.

Finally, before proceeding, we will use the following code to disable spherical geometries within the sf package, which will allow us to map our data with the tmap package.

# disable spherical geometries
sf_use_s2(use_s2 = FALSE)
## Spherical geometry (s2) switched off

2.4 Object assignment

Before proceeding, it is useful to briefly consider the concept of object assignment, which will make the subsequent sections easier to follow. Consider the following example:

# assign value 5 to new object named x
x<-5

In the code above, we used R’s assignment operator (<-, i.e. a left-facing arrow) to assign the value 5 to an object named “x.” Now that an object named “x” has been created and assigned the value 5, printing “x” in our console (or printing “x” in our script and running it) will return the value 5:

# Print value assigned to object "x"
x
## [1] 5

More generally, the process of assignment effectively equates the output created by the code on the right side of the assignment operator (<-) to an object with a name that is specified on the left side of the assignment operator. Whenever we want to look at the value assigned to an object (i.e. the output created by the code to the right side of the assignment operator), we simply print the name of the object in the R console (or print the name and run it within a script).

While the example above was very simple, we can assign virtually any R code, and by extension, the data structure(s) generated by that code (such as datasets, maps, graphs) to an R object. Indeed, we’ll use the basic principle of object assignment introduced above to assign the datasets we’ll import below to new objects. Note that object names are arbitrary and could be virtually anything, but it is good practice for object names to describe their contents. If the concept of object assignment is new, it will begin to make more sense as we go.
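To make this concrete, here is a minimal sketch that assigns something larger than a single number to an object: first a vector of country codes, and then a small table built from it. The object names and all of the values are made up for illustration; they are not taken from the TRIP dataset.

```r
# Assign a vector of three country codes to an object named "country_codes"
country_codes <- c("ARG", "BRA", "CHL")

# Assign a small table (a data frame) to an object named "example_scores";
# the values in "example_value" are invented for illustration
example_scores <- data.frame(country_text_id = country_codes,
                             example_value = c(1, 0, 1))

# Print the table by typing the name of the object
example_scores

# Objects can be passed to functions; here we count the table's rows
nrow(example_scores)
```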

Now that we’ve taken care of these preliminary steps, let’s go ahead and load our data into R Studio. Below, we’ll first load in a spatial dataset of world boundaries, and then read in the TRIP dataset on transgender rights.

3 Load, explore, and process the spatial dataset of country boundaries

Before turning again to the global dataset of transgender rights from TRIP (which we introduced earlier), we will first load a spatial dataset of country boundaries into R, and learn how to work with such datasets. After that, we’ll return to the TRIP data, and learn how to join this data to the spatial dataset of country boundaries, which will allow us to subsequently visualize the TRIP data on a global map.

3.1 Load the spatial dataset of country boundaries into R and assign it to a new object

When working with spatial data in R, we will sometimes want to import data that is stored on our computer. There are several functions in the sf package that will allow us to easily import saved or downloaded spatial data into R; the most commonly used sf package function to load saved spatial vector data into R is the st_read() function. For more details, please consult the st_read() function’s documentation by typing ?st_read().
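Although we won’t need st_read() in this tutorial, a minimal sketch of how it works may be useful for future reference. It reads the North Carolina counties shapefile that ships with the sf package as example data, so nothing needs to be downloaded; the object names nc_path and nc_counties are illustrative. For your own data, you would pass the path to your file instead.

```r
library(sf)

# Locate the example shapefile that is bundled with the sf package
nc_path <- system.file("shape/nc.shp", package = "sf")

# Read the shapefile into R as an sf object and assign it to "nc_counties"
nc_counties <- st_read(nc_path)

# Confirm that the imported object is an sf object
class(nc_counties)
```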

In our case, however, we won’t have to download and import the spatial data we need into R Studio from our computer’s local drive. That is because there are R packages that already provide this spatial data, and allow us to directly load it into memory. In particular, we’ll use the ne_countries() function of the rnaturalearth package to bring a spatial dataset of country borders into our R environment, and then assign it to an object that we will name country_boundaries:

# Brings spatial dataset of country boundaries into R environment using the rnaturalearth package, and then assigns this spatial dataset to an object named "country_boundaries"
country_boundaries<-ne_countries(scale="medium", returnclass="sf")

Note the two arguments we pass to the ne_countries function: the “scale” argument specifies that we want to use a medium scale when rendering the map (the other options are ‘small’ and ‘large’), while the “returnclass” argument specifies that we want the spatial dataset as an sf object.

3.2 Explore the spatial dataset

Now that we have our spatial dataset of country boundaries loaded into our R environment and assigned to the new country_boundaries object, let’s open up this dataset and see what it looks like. The best way to view a dataset in R Studio is to pass the name of the relevant object to the View() function, which will open up the dataset in R Studio’s built-in data viewer.

# View "country_boundaries" data in R Studio Data Viewer
View(country_boundaries)

By scrolling across the dataset, you’ll note that each row corresponds to a country, and that there are many columns that correspond to various country-level attributes. The crucial column, however, which makes this a spatial dataset (as opposed to merely a tabular one), is the column labeled “geometry”. This column contains geographic coordinate information that essentially defines a polygon for each country in the dataset. Note that the “geometry” column is likely one of the last columns in the dataset, so you may have to scroll a bit to find it.

To observe the information in the “geometry” column more clearly, we can extract that specific column. The dollar sign ($) is the R operator that allows us to extract a specified column; below, we are extracting the “geometry” column from the dataset assigned to the country_boundaries object:

# Extracts "geometry" column from country_boundaries
country_boundaries$geometry
## Geometry set for 241 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## Geodetic CRS:  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 5 geometries:
## MULTIPOLYGON (((-69.89912 12.452, -69.8957 12.4...
## MULTIPOLYGON (((74.89131 37.23164, 74.84023 37....
## MULTIPOLYGON (((14.19082 -5.875977, 14.39863 -5...
## MULTIPOLYGON (((-63.00122 18.22178, -63.16001 1...
## MULTIPOLYGON (((20.06396 42.54727, 20.10352 42....

Note that extracting the “geometry” column prints some useful metadata; it tells us that the dataset has 241 features, and that it represents spatial information as polygons (geometry type: MULTIPOLYGON). It also provides information on the dataset’s coordinate reference system (“CRS”). Roughly speaking, coordinate reference systems provide information on how actual locations on the Earth correspond to points on a two-dimensional map. They are a crucial concept to understand when carrying out geospatial analysis, but we won’t go into coordinate reference systems in detail, since you won’t need an in-depth understanding of them for basic cartography. For now, what is important to notice is that the “geometry” column is comprised of multiple geographic coordinates for each row (which corresponds to a distinct country); we can use this information in the “geometry” column to draw georeferenced polygons for each country/row in the spatial dataset, which will yield a world map!
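The same metadata can also be requested piece by piece, which is handy when you want one specific item rather than the full printout. The sketch below uses standard sf helper functions; it reloads country_boundaries so that it runs on its own. The number of features you see may differ from the 241 shown above, depending on the version of the Natural Earth data installed on your computer.

```r
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)

# Load the country boundaries (as in Section 3.1)
country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")

# Number of features (i.e. rows) in the spatial dataset
nrow(country_boundaries)

# Geometry type(s) used to represent the features
unique(st_geometry_type(country_boundaries))

# Coordinate reference system of the dataset
st_crs(country_boundaries)

# Bounding box (the minimum and maximum coordinates of the dataset)
st_bbox(country_boundaries)
```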

3.3 Use information from the “geometry” column to render a map

To translate the information in the “geometry” column of the dataset into a cartographic representation, we’ll use the tmap package. In particular, we’ll use the tm_shape and tm_polygons functions from tmap, which are connected by a plus sign (+). The argument passed to the tm_shape function is the name of the object associated with the spatial dataset (country_boundaries, defined above). In addition, the tm_polygons function indicates that the spatial data is to be represented using polygons (as opposed to alternatives such as lines or points), and does not require any arguments (we’ll add some optional arguments to customize the map’s appearance in just a bit). When we type in and run the following code from our script, the result is a map that is rendered based on the information in the “geometry” column of country_boundaries:

# maps geographic features (i.e. countries) of "country_boundaries" as polygons using tmap package functions
tm_shape(country_boundaries)+
  tm_polygons()

If you don’t like the grey polygons, you can specify a desired color within the tm_polygons() function. For guidance on working with colors in R (including information on color and palette codes), see this extremely useful R Color Cheatsheet, by Melanie Frazier.

For example, let’s say we want to draw the polygons in the color associated with “darkorange” on the cheat sheet. We can use the following:

# maps geographic features (i.e. countries) of "country_boundaries" as polygons using tmap package functions; polygons rendered in "darkorange"
tm_shape(country_boundaries)+
  tm_polygons("darkorange")

Or, say we prefer the color associated with the label “cadetblue2”:

# Maps country polygons from "country_boundaries" in "cadetblue2"
tm_shape(country_boundaries)+
  tm_polygons("cadetblue2")

Just as we can assign datasets or numeric values to objects, so too with maps. For example, let’s say we want to assign the orange world map we generated above to an object named world_map_orange:

# assigns dark orange world map to object named "world_map_orange"
world_map_orange<-tm_shape(country_boundaries)+
                      tm_polygons("darkorange")

Now, whenever we want to bring up that particular map, we can simply print the name of the object, and the map will render in the “Plots” tab of the R Studio interface (on the bottom-right of the screen):

# prints contents of "world_map_orange"
world_map_orange

3.4 Make an interactive map

One of the nice things about tmap is that it allows us to toggle back and forth between static print maps, and dynamic interactive maps that allow users to zoom in/out, pan around, view attribute characteristics etc. All you have to do to generate an interactive map is use the tmap_mode() function to shift into “view” mode with the following:

# set tmap mode to "view"
tmap_mode("view")
## tmap mode set to interactive viewing

Now, our tmap code outputs will yield a dynamic map:

# prints contents of "world_map_orange" in "view" mode
world_map_orange

This map can easily be saved as an html document, and subsequently embedded on a website.
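One way to do this is with the tmap_save() function: as a sketch, giving it a file name that ends in .html should write the map out as an interactive web page. The file name below is an arbitrary choice, and the code rebuilds the orange map so that it runs on its own. Depending on your setup, saving a self-contained HTML file may require pandoc, which R Studio normally provides.

```r
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)

# Disable spherical geometries, as in Section 2.3
sf_use_s2(FALSE)

# Rebuild the orange world map from Section 3.3
country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")
world_map_orange <- tm_shape(country_boundaries) +
  tm_polygons("darkorange")

# Save the map to the working directory as an HTML file;
# the file name is an arbitrary choice
tmap_save(tm = world_map_orange, filename = "world_map_orange.html")
```

The resulting file can be opened in any web browser.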

If we want to shift back to a static map, simply switch back to “plot” mode via the same tmap_mode function:

# set tmap mode to "plot"
tmap_mode("plot")
## tmap mode set to plotting

Now, our tmap code will once again yield a static representation of the spatial information embedded in country_boundaries:

# prints contents of "world_map_orange" in "plot" mode
world_map_orange

3.5 Edit spatial datasets

We can edit spatial datasets in R Studio with relative ease, using functions from commonly-used data science packages from the tidyverse. Let’s say, for example, that we don’t want Antarctica to appear on our map (since Antarctica typically does not appear on political maps of the world).

To delete Antarctica from the map, we first need to delete the row that corresponds to Antarctica in country_boundaries. We can do so with the following code:

# Deletes Antarctica from "country_boundaries"
country_boundaries_modified<-country_boundaries %>% filter(iso_a3 !="ATA")

We can translate the code above into ordinary language as follows: “Take the existing country boundaries dataset (country_boundaries to the left of the %>% and to the right of the assignment operator) and then (%>%, a symbol called a pipe, which is used to chain together code) select only the countries that are not Antarctica (filter(iso_a3 !="ATA")). Take this amended (sans Antarctica) spatial dataset, and assign it to a new object named country_boundaries_modified (country_boundaries_modified<-).”

Two things may require additional elaboration:

  • First is the pipe, the symbol that looks like this: %>%. The pipe operator essentially takes the output of the code on its left, and then uses that output as an input to the code on its right. Here, the pipe takes the country_boundaries spatial object on its left, and then feeds this data into the filter() function on its right. In other words, the pipe operator links the code on its two sides, and establishes that the data to be “filtered” within the filter() function is country_boundaries.
  • The filter() function is a function from the dplyr package that allows one to select rows from a dataset using specified criteria. In our case, we want to select all rows from the dataset that are not Antarctica. The argument passed to the filter() function, iso_a3 !="ATA", is essentially saying “return any records where the “iso_a3” variable (i.e. the 3-letter ISO country code) in the attribute table is NOT equal to “ATA” (Antarctica’s code)”. Note that != is R syntax for “not equal to”. If we were to instead type filter(iso_a3=="ATA"), the function would only select the Antarctica row from the dataset and discard everything else.
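The behavior of the pipe and of filter() can be seen on a small scale with a toy table. The sketch below builds a three-row table by hand (the table and the object names are made up for illustration) and then applies both versions of the filter discussed above.

```r
library(dplyr)

# A toy table with three rows, created for illustration only
example_countries <- data.frame(iso_a3 = c("ATA", "BRA", "IND"),
                                name = c("Antarctica", "Brazil", "India"))

# Keep every row whose code is NOT "ATA" (drops Antarctica)
without_antarctica <- example_countries %>% filter(iso_a3 != "ATA")
without_antarctica

# Keep only the row whose code IS "ATA" (drops everything else)
only_antarctica <- example_countries %>% filter(iso_a3 == "ATA")
only_antarctica
```

The first result contains the rows for Brazil and India, while the second contains only the row for Antarctica; the original example_countries table is unchanged in both cases.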

Now, let’s go ahead and map the new country_boundaries_modified object:

# maps updated "country_boundaries_modified" object
tm_shape(country_boundaries_modified)+
  tm_polygons()

Notice that Antarctica is no longer mapped, since the Antarctica record is not in the country_boundaries_modified object that contains the underlying data.

However, Antarctica is still in the country_boundaries object, which we can confirm with the following:

# maps "country_boundaries" object
tm_shape(country_boundaries)+
  tm_polygons()

4 Replicate published maps

4.1 Read in the TRIP dataset on transgender rights

Now that we’ve loaded and explored our world map, it’s time to read the TRIP dataset on transgender rights into our R environment and assign it to an object. You should have already downloaded the data to a dedicated workshop directory, changed the file name of the dataset we’ll be working with, and set the dedicated workshop directory as your R working directory. At this point, we can use the read_excel() function (since the dataset is an Excel file) to read the “TripScores.xlsx” dataset into R and assign it to an object, which we’ll name trips:

# read in the "TripScores.xlsx" Excel file from the working directory into R Studio using the "read_excel()" function and assign it to a new object named "trips"
trips<-read_excel("TripScores.xlsx")
# View "trips" data in R Studio Data Viewer
View(trips)

4.2 Create a Map of GMC Laws in 2000

# filter by year to get 2000 data
trips_2000<-trips %>% filter(year==2000)
# Joins "trips_2000" to "country_boundaries" using 3-digit ISO codes; these ISO codes are contained in a column named "iso_a3" in "country_boundaries", and "country_text_id" in "trips_2000"; the product of the join is assigned to a new object that is named "trips_2000_spatial"
trips_2000_spatial<-left_join(country_boundaries, trips_2000,
                                    by=c("iso_a3"="country_text_id"))
# replicates 2000 gmc map from paper (figure 3)
gmc_2000_map_replication<-tm_shape(trips_2000_spatial)+
                tm_polygons(col="gmc",
                            style="cat",
                            title="",
                            palette=c("grey90", "grey70"),
                            colorNA="white",
                            textNA="No Data",
                            labels=c("Not possible/specified", "Possible, de jure"))+
                tm_layout(frame=FALSE,
                          legend.outside=TRUE,
                          legend.text.size=0.6,
                          main.title="National Laws allowing legal gender marker change, 2000",
                          main.title.size = 0.8,
                          main.title.position = 0.2,
                          inner.margins=c(0.06, 0.1, 0.1, 0.08))
# prints "gmc_2000_map_replication"
gmc_2000_map_replication
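The join in the code above is the step that attaches the TRIP data to the country polygons, so it is worth seeing how it behaves on a small scale. The sketch below uses two toy tables (all names and values are invented for illustration) to show how left_join() matches rows when the key columns have different names, and how anti_join() can reveal records that found no match. Countries without a match end up with missing values, which would appear as “No Data” on a map.

```r
library(dplyr)

# Toy stand-in for the boundaries table (no geometry, for simplicity)
boundaries_example <- data.frame(iso_a3 = c("ARG", "BRA", "CHL"),
                                 name = c("Argentina", "Brazil", "Chile"))

# Toy stand-in for the rights table; "XXX" is a made-up code with no match
rights_example <- data.frame(country_text_id = c("ARG", "BRA", "XXX"),
                             gmc = c(1, 0, 1))

# Keep every row of the boundaries table, attaching rights data where codes match
joined_example <- left_join(boundaries_example, rights_example,
                            by = c("iso_a3" = "country_text_id"))
joined_example

# List rows of the rights table whose code matched no boundary record
unmatched_rights <- anti_join(rights_example, boundaries_example,
                              by = c("country_text_id" = "iso_a3"))
unmatched_rights
```

Here, Chile is kept in the joined table with a missing gmc value, and the “XXX” record is reported as unmatched. Running a similar anti_join() check on the real data is one way to spot countries whose codes differ between the two datasets.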

Recall that we can also make an interactive map using tmap. Let’s make an interactive version of trips_2000_spatial. To do so, we’ll first change the tmap mode to “view” within the tmap_mode() function:

# changes tmap mode to "view"
tmap_mode("view")
## tmap mode set to interactive viewing

Now, we can simply print the name of the gmc_2000_map_replication object and the map assigned to it will appear in interactive mode:

# makes interactive version of "gmc_2000_map_replication"
gmc_2000_map_replication

Before continuing, we’ll return to “plot” mode so that subsequent maps will appear as static maps:

# returns to "plot" mode
tmap_mode("plot")

4.2.1 Create a Map of GMC Laws in 2021

Based on what you learned in the previous section, see if you can create a map of 2021 gender marker change laws (gmc); in other words, see if you can replicate Figure 4 in the paper that describes the TRIP dataset. Your final product would look something like this:

4.2.2 Modify the color scheme of the published map(s)

As an additional exercise, see if you can modify the color scheme in the map you created above. Remember to consult the R Color Cheat Sheet. What colors did you choose to represent the categories and why?

5 Create a new map that displays an overall index of transgender rights in 2021

# maps the overall TRIP index ("trip_score") for 2021 in three categories; uses the "trips_2021_spatial" object created in the exercise in Section 4.2.1 (see the Appendix for the code)
transgender_rights_overall_map<-tm_shape(trips_2021_spatial)+
                tm_polygons(col="trip_score",
                            title="TRIP Score",
                            breaks=c(0, 3, 8.01, 13.01),
                            labels=c("Minimal Protections", "Moderate Protections", "Robust Protections"),
                            palette=c("lightgreen", "mediumpurple", "purple4"),
                            colorNA="white",
                            textNA="No Data")+
                tm_layout(frame=FALSE,
                          legend.outside=TRUE,
                          legend.text.size=0.6,
                          main.title="Legal Protections for Transgender Rights",
                          main.title.size = 0.8,
                          main.title.position = 0.2,
                          inner.margins=c(0.06, 0.1, 0.1, 0.08))
# prints "transgender_rights_overall_map"
transgender_rights_overall_map
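To see how the breaks argument sorts index values into the three labeled categories, the sketch below applies the same break points to a handful of made-up scores using base R’s cut() function. It assumes that intervals are closed on the left, which, as far as I am aware, is tmap’s default behavior; under that assumption, the break points at 8.01 and 13.01 keep scores of exactly 8 and 13 in the second and third categories, respectively.

```r
# Hypothetical index values, for illustration only
example_scores <- c(0, 2, 3, 8, 13)

# Sort the scores into categories using the same breaks and labels as the map;
# right = FALSE makes each interval include its lower bound but not its upper bound
score_categories <- cut(example_scores,
                        breaks = c(0, 3, 8.01, 13.01),
                        labels = c("Minimal Protections",
                                   "Moderate Protections",
                                   "Robust Protections"),
                        right = FALSE)

# Show each score next to its category
data.frame(example_scores, score_categories)
```

With these values, scores of 0 and 2 fall under “Minimal Protections”, 3 and 8 under “Moderate Protections”, and 13 under “Robust Protections”.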

6 Export and save maps

Finally, once we have made our map(s) in R Studio, and everything looks satisfactory, we’ll want to export them, so that they can be shared, embedded in papers etc.

One easy way to export your maps is to use the “Export” button within the “Plots” window of your R Studio interface. Once you click on the “Export” button, things are fairly self-explanatory; you click through a few menus, and can save (to a specified location on your computer) the map displayed in the “Plots” window as a PDF or as an image file.

If you prefer to export your map programmatically (which may be preferable, from a reproducibility standpoint), you have a few options. The easiest programmatic option is to use the tmap_save function, which is a part of tmap. For example, recall the map assigned to gmc_2000_map_replication, which was a replication of Figure 3 in the paper; if you need to refresh your memory, it looks like this:

gmc_2000_map_replication

Now, let’s export this map using the following code:

# exports map assigned to "gmc_2000_map_replication" object to working directory as PDF file
tmap_save(tm=gmc_2000_map_replication, 
          filename="trip_gmc_map_2000.pdf", 
          width=1920, 
          height=1080)
## Map saved to /Users/adra7980/Documents/git_repositories/taw_mapping/exported_maps/trip_gmc_map_2000.pdf
## Size: 6.388889 by 3.597222 inches

Above, the first argument to tmap_save is the name of the map object we want to export (gmc_2000_map_replication), the second argument is the file name we want to use for the exported file (along with the desired extension), and the “width” and “height” arguments specify the dimensions of the exported map. When this code is run, the map is exported to your working directory. Note that if we wanted our map as an image file, we could have simply specified a different file extension (e.g. .png instead of .pdf). It’s best to experiment with different values for the “width” and “height” arguments until you get the exported map looking the way you want. There are other potential arguments to tmap_save() that allow you to further customize your exported map; we won’t review them here, but you can learn more by looking at the function’s documentation by typing ?tmap_save. If you want to export a map with an inset, you will need to specify the name of the inset map object, and the viewport specifications.
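As an illustration of the image-file option, the sketch below exports a simple world map as a PNG file, with the dimensions given in inches and an explicit resolution. The map, the file name, and the particular width, height, and dpi values are arbitrary choices made for illustration; the code builds its own map so that it runs independently of the earlier sections.

```r
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)

# Disable spherical geometries, as in Section 2.3
sf_use_s2(FALSE)

# Build a simple world map to export
country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")
world_map <- tm_shape(country_boundaries) +
  tm_polygons()

# Export the map to the working directory as a PNG file that is
# 8 inches wide and 4.5 inches tall, at 300 dots per inch
tmap_save(tm = world_map,
          filename = "world_boundaries_map.png",
          width = 8,
          height = 4.5,
          units = "in",
          dpi = 300)
```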

7 Appendix

Solution to Exercise in Section 4.2.1

To create the map in Section 4.2.1 (which is a replication of the map in Figure 4 of the TRIP paper), you could use the following code.

# filter by year to get 2021 data
trips_2021<-trips %>% filter(year==2021)

# Joins "trips_2021" to "country_boundaries" using 3-digit ISO codes; these ISO codes are contained in a column named "iso_a3" in "country_boundaries", and "country_text_id" in "trips_2021"; the product of the join is assigned to a new object that is named "trips_2021_spatial"
trips_2021_spatial<-left_join(country_boundaries, trips_2021,
                                    by=c("iso_a3"="country_text_id"))


# replicates gmc map from paper (figure 4)
gmc_2021_map_replication<-tm_shape(trips_2021_spatial)+
                tm_polygons(col="gmc",
                            style="cat",
                            title="",
                            palette=c("grey90", "grey70"),
                            colorNA="white",
                            textNA="No Data",
                            labels=c("Not possible/specified", "Possible, de jure"))+
                tm_layout(frame=FALSE,
                          legend.outside=TRUE,
                          legend.text.size=0.6,
                          main.title="National Laws allowing legal gender marker change, 2021",
                          main.title.size = 0.8,
                          main.title.position = 0.2,
                          inner.margins=c(0.06, 0.1, 0.1, 0.08))